Multiplication of Matrices of Arbitrary Shape on a Data

Authors

  • Kapil K. Mathur
  • S. Lennart Johnsson
Abstract

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms. We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is shown both analytically and experimentally that the optimum shape of the processor array yields square stationary submatrices in each processor, i.e., the ratio between the lengths of the axes of the processing array must be the same as the ratio between the corresponding axes of the stationary matrix. The optimum processor array shape may yield a factor of five performance enhancement for the multiplication of square matrices. For rectangular matrices a factor of 30 improvement was observed for an optimum processor array shape compared to a poorly chosen processor array shape.
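
The processor array shape rule stated in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; the function name best_grid, the exhaustive search over factor pairs, and the example sizes are assumptions made for illustration. It picks the two-dimensional processor array whose per-processor block of the stationary matrix is closest to square.

```python
from math import ceil


def best_grid(rows, cols, n_procs):
    """Return (pr, pc) with pr * pc == n_procs such that the local block of a
    rows x cols stationary matrix is as close to square as possible.

    Hypothetical helper: it only illustrates the "square stationary
    submatrices" rule and ignores every other cost.
    """
    best = None
    for pr in range(1, n_procs + 1):
        if n_procs % pr:
            continue  # pr must divide the processor count
        pc = n_procs // pr
        local_r = ceil(rows / pr)  # rows of the block held by one processor
        local_c = ceil(cols / pc)  # columns of the block held by one processor
        aspect = max(local_r, local_c) / min(local_r, local_c)
        if best is None or aspect < best[0]:
            best = (aspect, pr, pc)
    return best[1], best[2]


if __name__ == "__main__":
    # Square stationary matrix: a square processor array is chosen.
    print(best_grid(1024, 1024, 64))
    # 16:1 stationary matrix: the processor array axes also have ratio 16:1.
    print(best_grid(4096, 256, 64))
```

In the second call the stationary matrix has an axis ratio of 4096:256 = 16:1, and the selected array of 32 by 2 processors has the same ratio, so each processor holds a 128 by 128 block.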

Related articles

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure, so a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...

Electro-spun organic nanofibers elaboration process investigations using BPs operational matrices

In this paper, the operational matrix of Bernstein Polynomials (BPs) is used to solve the Bratu equation. This nonlinear equation appears in the framework of a particular electrospun nanofiber fabrication process. Electrospun organic nanofibers have been used for a large variety of filtration applications, such as in the non-woven and filtration industries. By using operational matrix of fractional integration...

Distributed General Matrix Multiply and Add for a 2D Mesh Processor Network

A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation GEMM, i.e., general matrix multiply and add, is presented. By the same functionality we mean the ability to perform GEMM operations on arbitrary subarrays of the matrices involved. The logical network is a 2D square mesh with torus connectivity. The matrices involved are distributed with non-sc...

A note on primary-like submodules of multiplication modules

Primary-like and weakly primary-like submodules are two new generalizations of primary ideals from rings to modules. In fact, the class of primary-like submodules of a module lies properly between the primary submodules and the weakly primary-like submodules. In this note, we show that these three classes coincide when their elements are submodules of a multiplication module and satisfy the primeful pr...

Data confidentiality in cloud-based pervasive system

Data confidentiality and privacy are serious concerns in pervasive systems where cloud computing is used to process huge amounts of data, such as the matrix multiplications typically used in HPC. Due to limited processing capabilities, smart devices need to rely on cloud servers for heavy-duty computations such as matrix multiplication. Conventional security mechanisms such as public key encryption is...

Publication date: 1992